Building Highly-Optimized, Low-Latency Pipelines for Genomic Data Analysis
نویسندگان
چکیده
Next-generation sequencing has transformed genomics into a new paradigm of data-intensive computing. The deluge of genomic data needs to undergo deep analysis to mine biological information. Deep analysis pipelines often take days to run, which entails a long cycle for algorithm and method development and hinders future application for clinic use. In this project, we aim to bring big data technology to the genomics domain and innovate in this new domain to revolutionize its data crunching power. Our work includes the development of a deep analysis pipeline, a parallel platform for pipeline execution, and a principled approach to optimizing the pipeline. We also present some initial evaluation results using existing long-running pipelines at the New York Genome Center, as well as a variety of real use cases that we plan to build in the course of this project.
منابع مشابه
RDFPRO: an extensible tool for building stream-oriented RDF processing pipelines
We present RDFPRO (RDF Processor), an open source Java command line tool and embeddable library that offers a suite of stream-oriented, highly optimized processors for common tasks such as data filtering, RDFS inference, smushing and statistics extraction. RDFPRO processors are extensible by users and can be freely composed to form complex pipelines to efficiently process RDF data in one or mor...
متن کاملAnalysis and Optimization using Renewable Energies to Get Net-Zero Energy Building for Warm Climate
Due to low energy price, economic optimization of consumption has no justification for users in Iran. Nowadays, the issue of ending fossil fuels, production of greenhouse gases and the main role of building in consumption of considerable amount of energy has drawn the focus of global researches to a new concept called net zero energy building. In this study, modeling, simulation and energy anal...
متن کاملمقایسه روش های مختلف آماری در انتخاب ژنومی گاوهای هلشتاین
Genomic selection combines statistical methods with genomic data to predict genetic values for complex traits. The accuracy of prediction of genetic values in selected population has a great effect on the success of this selection method. Accuracy of genomic prediction is highly dependent on the statistical model used to estimate marker effects in reference population. Various factors such a...
متن کاملReliableGenome: annotation of genomic regions with high/low variant calling concordance
MOTIVATION The increasing adoption of clinical whole-genome resequencing (WGS) demands for highly accurate and reproducible variant calling (VC) methods. The observed discordance between state-of-the-art VC pipelines, however, indicates that the current practice still suffers from non-negligible numbers of false positive and negative SNV and INDEL calls that were shown to be enriched among disc...
متن کاملUber: Utilizing Buffers to Simplify NoCs for Hundreds-Cores
Approaching ideal wire latency using a network-on-chip (NoC) is an important practical problem for many-core systems, particularly hundreds-cores. Although other researchers have focused on optimizing large meshes, bypassing or speculating router pipelines, or creating more intricate logarithmic topologies, this paper proposes a balanced combination that trades queue buffers for simplicity. Pre...
متن کامل